AITopics | react agent

Collaborating Authors

react agent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections

Bharadwaj, Manasa, Verma, Nikhil, Ferreira, Kevin

arXiv.org Artificial IntelligenceJun-24-2025

Efforts to improve Large Language Model (LLM) agent performance on complex tasks have largely focused on fine-tuning and iterative self-correction. However, these approaches often lack generalizable mechanisms for longterm learning and remain inefficient in dynamic environments. We introduce OmniReflect, a hierarchical, reflection-driven framework that constructs a constitution, a compact set of guiding principles distilled from task experiences, to enhance the effectiveness and efficiency of an LLM agent. OmniReflect operates in two modes: Self-sustaining, where a single agent periodically curates its own reflections during task execution, and Co-operative, where a Meta-advisor derives a constitution from a small calibration set to guide another agent. To construct these constitutional principles, we employ Neural, Symbolic, and NeuroSymbolic techniques, offering a balance between contextual adaptability and computational efficiency. Empirical results averaged across models show major improvements in task success, with absolute gains of +10.3% on ALFWorld, +23.8% on BabyAI, and +8.3% on PDDL in the Self-sustaining mode. Similar gains are seen in the Co-operative mode, where a lightweight Qwen3-4B ReAct agent outperforms all Reflexion baselines on BabyAI. These findings highlight the robustness and effectiveness of OmniReflect across environments and backbones.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.17449

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Research Bench: Evaluating AI Web Research Agents

FutureSearch, null, :, null, Bosse, Nikos I., Evans, Jon, Gambee, Robert G., Hnyk, Daniel, Mühlbacher, Peter, Phillips, Lawrence, Schwarz, Dan, Wildman, Jack

arXiv.org Artificial IntelligenceJun-10-2025

Amongst the most common use cases of modern AI is LLM chat with web search enabled. However, no direct evaluations of the quality of web research agents exist that control for the continually-changing web. We introduce Deep Research Bench, consisting of 89 multi-step web research task instances of varying difficulty across 8 diverse task categories, with the answers carefully worked out by skilled humans. We provide a "RetroSearch" environment with a large frozen set of scraped web pages, and demonstrate that offline "RetroSearch" agents perform comparably to "live web" agents, enabling reliable evaluations of models over time. We provide robust agent tooling and scaffolding to benchmark major LLMs as they are released, including "thinking" models like o3 and Gemini 2.5 Pro. We include automated evaluations of the lengthy agent traces to report progress over time in hallucinations, tool use, and forgetting. Finally, we evaluate the major web research products branded as "Deep Research", "Deep Search", "Search", or "Research." Results are available on a public leaderboard at https://drb.futuresearch.ai/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.06287

Country:

Asia > China (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Energy (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Towards a Realistic Long-Term Benchmark for Open-Web Research Agents

Mühlbacher, Peter, Bosse, Nikos I., Phillips, Lawrence

arXiv.org Artificial IntelligenceSep-25-2024

We present initial results of a forthcoming benchmark for evaluating LLM agents on white-collar tasks of economic value. We evaluate agents on real-world "messy" open-web research tasks of the type that are routine in finance and consulting. In doing so, we lay the groundwork for an LLM agent evaluation suite where good performance directly corresponds to a large economic and societal impact. We built and tested several agent architectures with o1-preview, GPT-4o, Claude-3.5 Sonnet, Llama 3.1 (405b), and GPT-4o-mini. On average, LLM agents powered by Claude-3.5 Sonnet and o1-preview substantially outperformed agents using GPT-4o, with agents based on Llama 3.1 (405b) and GPT-4o-mini lagging noticeably behind. Across LLMs, a ReAct architecture with the ability to delegate subtasks to subagents performed best. In addition to quantitative evaluations, we qualitatively assessed the performance of the LLM agents by inspecting their traces and reflecting on their observations. Our evaluation represents the first in-depth assessment of agents' abilities to conduct challenging, economically valuable analyst-style research on the real open web.

agent, planning agent, react agent, (16 more...)

arXiv.org Artificial Intelligence

2409.14913

Country:

Asia > Russia (0.93)
Asia > China (0.70)
North America > United States (0.28)
(3 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (1.00)
Government > Military (1.00)
Banking & Finance > Trading (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HAMMR: HierArchical MultiModal React agents for generic VQA

Castrejon, Lluis, Mensink, Thomas, Zhou, Howard, Ferrari, Vittorio, Araujo, Andre, Uijlings, Jasper

arXiv.org Artificial IntelligenceApr-8-2024

Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal problems. Therefore we pose the VQA problem from a unified perspective and evaluate a single system on a varied suite of VQA tasks including counting, spatial reasoning, OCR-based reasoning, visual pointing, external knowledge, and more. In this setting, we demonstrate that naively applying the LLM+tools approach using the combined set of all tools leads to poor results. This motivates us to introduce HAMMR: HierArchical Multi-Modal React. We start from a multimodal ReAct-based [55] system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents. This enhances the compositionality of the LLM+tools approach, which we show to be critical for obtaining high accuracy on generic VQA. Concretely, on our generic VQA suite, HAMMR outperforms the naive LLM+tools approach by 19.5%. Additionally, HAMMR achieves state-of-the-art results on this task, outperforming the generic standalone PaLI-X VQA model [10] by 5.0%.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.05465

Country:

North America > United States > New York (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

ReACT: Reinforcement Learning for Controller Parametrization using B-Spline Geometries

Rudolf, Thomas, Flögel, Daniel, Schürmann, Tobias, Süß, Simon, Schwab, Stefan, Hohmann, Sören

arXiv.org Artificial IntelligenceJan-10-2024

Robust and performant controllers are essential for industrial applications. However, deriving controller parameters for complex and nonlinear systems is challenging and time-consuming. To facilitate automatic controller parametrization, this work presents a novel approach using deep reinforcement learning (DRL) with N-dimensional B-spline geometries (BSGs). We focus on the control of parameter-variant systems, a class of systems with complex behavior which depends on the operating conditions. For this system class, gain-scheduling control structures are widely used in applications across industries due to well-known design principles. Facilitating the expensive controller parametrization task regarding these control structures, we deploy an DRL agent. Based on control system observations, the agent autonomously decides how to adapt the controller parameters. We make the adaptation process more efficient by introducing BSGs to map the controller parameters which may depend on numerous operating conditions. To preprocess time-series data and extract a fixed-length feature vector, we use a long short-term memory (LSTM) neural networks. Furthermore, this work contributes actor regularizations that are relevant to real-world environments which differ from training. Accordingly, we apply dropout layer normalization to the actor and critic networks of the truncated quantile critic (TQC) algorithm. To show our approach's working principle and effectiveness, we train and evaluate the DRL agent on the parametrization task of an industrial control structure with parameter lookup tables.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2401.05251

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Add feedback